Osaka Kyoiku University at NTCIR-10 CrossLink-2: Link Filtering by Title Tag of Corpus as a Dictionary

Author

  • Takashi Sato
Abstract

Our group (OKSAT) submitted two types of runs, named SMP and REF, for every subtask of NTCIR-10 Cross-lingual Link Discovery (CLLD). Our method uses the titles of Wikipedia pages (the corpus) in the source language as the entries of a dictionary, so no external dictionary is required. For SMP, we aimed to discover the cross-lingual links of the actual Wikipedia; in other words, it targets the Wikipedia ground truth. For REF, on the other hand, we aimed to automatically discover as many meaningful cross-lingual links as possible.
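The idea of using corpus page titles as dictionary entries can be illustrated with a minimal sketch. This is not the OKSAT implementation: the function names, the `pages` layout (page id mapped to title), the whitespace tokenization, and the longest-match rule are all assumptions made for illustration, and whitespace splitting would not suit source languages written without spaces.

```python
from typing import Dict, List, Tuple


def build_title_dictionary(pages: Dict[str, str]) -> Dict[str, str]:
    """Map each lowercased page title to its page id.

    `pages` (page id -> title) is a hypothetical layout for this sketch.
    """
    return {title.lower(): page_id for page_id, title in pages.items()}


def find_anchor_candidates(
    text: str, titles: Dict[str, str], max_words: int = 4
) -> List[Tuple[str, str]]:
    """Scan the text left to right and return (phrase, page id) pairs.

    At each position the longest phrase (up to `max_words` words) that
    matches a title is taken; this greedy rule is an assumption.
    """
    words = text.split()
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        matched_len = 0
        for n in range(min(max_words, len(words) - i), 0, -1):
            # Strip surrounding punctuation before the dictionary lookup.
            phrase = " ".join(words[i:i + n]).strip(".,;:()").lower()
            if phrase in titles:
                found.append((phrase, titles[phrase]))
                matched_len = n
                break
        # Skip past a match, otherwise advance by one word.
        i += matched_len if matched_len else 1
    return found


if __name__ == "__main__":
    pages = {"p1": "Kyoto", "p2": "Osaka Castle", "p3": "Osaka"}
    titles = build_title_dictionary(pages)
    print(find_anchor_candidates("Osaka Castle is near Kyoto.", titles))
```

Because the dictionary is built from the corpus itself, every matched phrase already corresponds to an existing page, which is why no external dictionary is needed in this sketch.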


Similar resources

NTCIR-5 WEB Navi-2 Experiments at Osaka Kyoiku University - Page, Anchor and Title Indexing, and In-link Count, Inter Page and Inter Site Link Analyses

This paper describes experimental results of WEB Navigational Retrieval Subtask 2 (WEB Navi-2). We made three gram-based indices: one for the text of the whole page, one for text in the title tag, and one for text in the anchor tag. Because gram-based indices can index every string in the target text, words that are not found in dictionaries are indexed as well. We used words in TITLE tag of search topics ...


NTCIR-4 PATENT Experiments at Osaka Kyoiku University - Gram-Based Passage Index and Essential Words

We experimented with long gram-based indices in the NTCIR-4 patent task. No morphological analysis is required to build gram-based indices. The ABJ and DEJ tag fields are extracted from the NTCIR-4 patent corpus and indexed. Passages are also extracted and indexed. The total index size is 240 Gbyte, and building the indices takes about 86 hours. By merging the result of passage retrieval with the result of docu...


NTCIR-3 PAT Experiments at Osaka Kyoiku University: Long Gram-based Index and Essential Words

We experimented with long gram-based indices in the NTCIR-3 patent task. Building gram-based indices requires no analyses such as morphological analysis. The docno, abj, clj and dej tag fields are extracted from the NTCIR-3 patent corpus. The total index size is 11.4 Gbyte, and building the indices takes about 8.7 hours. The median search time per word is 9.8 msec for the abj index and 91.8 msec for the dej index. Av...


NTCIR-6 CLIR Experiments at Osaka Kyoiku University - Term Expansion Using Online Dictionaries and Weighting Score by Term Variety

This paper describes experimental results of the J-J subtask of NTCIR-6 CLIR. We expanded query terms using online dictionaries on the Web. This was effective for some topics whose average precision was low. A probabilistic model was employed for scoring, and we also modified this score by multiplying it by the number of varieties of query terms. In most cases this works well. Query term reduction should b...


NTCIR-4 WEB Experiments at Osaka Kyoiku University - Static/Dynamic Scoring Using Link Structure Analysis and Web Page Grouping

We performed gram-based indexing and retrieval for the NTCIR-4 WEB task. Building the indices takes 34.7 hours, and the indices occupy 30.2 Gbyte. The median retrieval time per word is 26 msec. The ranking algorithm for retrieval results is based on a traditional probabilistic model. We report on the results of gram-based indexing and retrieval, and propose a scoring method based on li...




Publication date: 2013